Interaction trap systems for detecting protein interactions

ABSTRACT

Disclosed herein is a method of determining whether a first protein is capable of physically interacting with a second protein, involving: (a) providing a host cell which contains (i) a reporter gene operably linked to a protein binding site; (ii) a first fusion gene which expresses a first fusion protein, the first fusion protein including the first protein covalently bonded to a binding moiety which is capable of specifically binding to the protein binding site; and (iii) a second fusion gene which expresses a second fusion protein, the second fusion protein including the second protein covalently bonded to a gene activating moiety and being conformationally-constrained; and (b) measuring expression of the reporter gene as a measure of an interaction between the first and the second proteins. Also disclosed are methods for assaying protein interactions, and identifying antagonists and agonists of protein interactions. Proteins isolated by these methods are also discussed. Finally, populations of eukaryotic cells are disclosed, each cell having a recombinant DNA molecule encoding a conformationally-constrained intracellular peptide.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. Ser. No. 08/504,538, filed Jul. 20, 1995, which is a continuation-in-part of U.S. Ser. No. 08/278,082, filed Jul. 20, 1994.

BACKGROUND OF THE INVENTION

This invention relates to methods for detecting protein interactions and isolating novel proteins.

SUMMARY OF THE INVENTION

In general, the invention features methods for detecting interactions among proteins.

Accordingly, in one aspect, the invention features a method of determining whether a first protein is capable of physically interacting with a second protein. The method includes (a) providing a host cell which contains (i) a reporter gene operably linked to a DNA-binding-protein recognition site; (ii) a first fusion gene which expresses a first fusion protein, the first fusion protein comprising the first protein covalently bonded to a binding moiety which is capable of specifically binding to the DNA-binding-protein recognition site; and (iii) a second fusion gene which expresses a second fusion protein, the second fusion protein including the second protein covalently bonded to a gene activating moiety and being conformationally-constrained; and (b) measuring expression of the reporter gene as a measure of an interaction between the first and the second proteins.

Preferably, the second protein is a short peptide of at least 6 amino acids in length and is less than or equal to 60 amino acids in length; includes a randomly generated or intentionally designed peptide sequence; includes one or more loops; or is conformationally-constrained as a result of covalent bonding to a conformation-constraining protein, e.g., thioredoxin or a thioredoxin-like molecule. Where the second protein is covalently bonded to a conformation-constraining protein, the invention features a polypeptide wherein the second protein is embedded within the conformation-constraining protein to which it is covalently bonded. Where the conformation-constraining protein is thioredoxin, the invention also features an additional method which includes a second protein which is conformationally-constrained by disulfide bonds between cysteine residues in the amino-terminus and in the carboxy-terminus of the second protein.

In another aspect, the invention features a method of detecting an interacting protein in a population of proteins, comprising: (a) providing a host cell which contains (i) a reporter gene operably linked to a DNA-binding-protein recognition site; and (ii) a fusion gene which expresses a fusion protein, the fusion protein including a test protein covalently bonded to a binding moiety which is capable of specifically binding to the DNA-binding-protein recognition site; (b) introducing into the host cell a second fusion gene which expresses a second fusion protein, the second fusion protein including one of said population of proteins covalently bonded to a gene activating moiety and being conformationally-constrained; and (c) measuring expression of the reporter gene. Preferably, the population of proteins includes short peptides of between 1 and 60 amino acids in length.

The invention also features a method of detecting an interacting protein within a population wherein the population of proteins is a set of randomly generated or intentionally designed peptide sequences, or where the population of proteins is conformationally-constrained by covalent bonding to a conformation-constraining protein. Preferably, where the population of proteins is conformationally-constrained by covalent bonding to a conformation-constraining protein, the population of proteins is embedded within the conformation-constraining protein. The invention further features a method of detecting an interacting protein within a population wherein the conformation-constraining protein is thioredoxin. Preferably, the population of proteins is inserted into the active site loop of the thioredoxin.

The invention further features a method wherein each of the population of proteins is conformationally-constrained by disulfide bonds between cysteine residues in the amino-terminus and in the carboxy-terminus of said protein.

In preferred embodiments of various aspects, the host cell is yeast; the DNA binding domain is LexA; the interacting protein includes one or more loops; and/or the reporter gene is assayed by a color reaction or by cell viability.

In other embodiments the bait may be Cdk2 or a Ras protein sequence.

In another related aspect, the invention features a method of identifying a candidate interactor. The method includes (a) providing a reporter gene operably linked to a DNA-binding-protein recognition site; (b) providing a first fusion protein, which includes a first protein covalently bonded to a binding moiety which is capable of specifically binding to the DNA-binding-protein recognition site; (c) providing a second fusion protein, which includes a second protein covalently bonded to a gene activating moiety and being conformationally-constrained, the second protein being capable of interacting with said first protein; (d) contacting said candidate interactor with said first protein and/or said second protein; and (e) measuring expression of said reporter gene.

The invention features a method of identifying a candidate interactor wherein the first fusion protein is provided by providing a first fusion gene which expresses the first fusion protein and wherein the second fusion protein is provided by providing a second fusion gene which expresses said second fusion protein. Alternatively, the reporter gene, the first fusion gene, and the second fusion gene are included on a single piece of DNA.

The invention also features a method of identifying candidate interactors wherein the first fusion protein and the second fusion protein are permitted to interact prior to contact with said candidate interactor, and a related method wherein the first fusion protein and the candidate interactor are permitted to interact prior to contact with said second fusion protein.

In a preferred embodiment, the candidate interactor is conformationally-constrained and may include one or more loops. Where the candidate interactor is an antagonist, reporter gene expression is reduced. Where the candidate interactor is an agonist, reporter gene expression is increased. The candidate interactor is a member selected from the group consisting of proteins, polynucleotides, and small molecules. In addition, a candidate interactor can be encoded by a member of a cDNA or synthetic DNA library. Moreover, the candidate interactor can be a mutated form of said first fusion protein or said second fusion protein.

In a preferred embodiment of any of the above aspects, the candidate interactor is isolated in vitro and shown to function in vivo, i.e., as a conformationally constrained intracellular peptide.

In a related aspect, the invention features a population of eukaryotic cells, each cell having a recombinant DNA molecule encoding a conformationally-constrained intracellular peptide, there being at least 100 different recombinant molecules in the population, each molecule being in at least one cell of said population.

Preferably, the intracellular peptides within the population of cells are conformationally-constrained because they are covalently bonded to a conformation-constraining protein.

In preferred embodiments the intracellular peptide is embedded within the conformation-constraining protein, preferably thioredoxin; the intracellular peptide is conformationally-constrained by disulfide bonds between cysteine residues in the amino-terminus and in the carboxy-terminus of said second protein; the intracellular peptide includes one or more loops; the population of eukaryotic cells are yeast cells; the recombinant DNA molecule further encodes a gene activating moiety covalently bonded to said intracellular peptide; and/or the intracellular peptide physically interacts with a second recombinant protein inside said eukaryotic cells.

In another aspect, the invention features a method of assaying an interaction between a first protein and a second protein. The method includes: (a) providing a reporter gene operably linked to a DNA-binding-protein recognition site; (b) providing a first fusion protein including a first protein covalently bonded to a binding moiety which is capable of specifically binding to the DNA-binding-protein recognition site; (c) providing a second fusion protein including a second protein which is conformationally constrained (and may include one or more loops) and is covalently bonded to a gene activating moiety; (d) combining the reporter gene, the first fusion protein, and the second fusion protein; and (e) measuring expression of the reporter gene.

In a preferred embodiment, the invention further features a method of assaying the interaction between two proteins wherein the first fusion protein is provided by providing a first fusion gene which expresses the first fusion protein and wherein the second fusion protein is provided by providing a second fusion gene which expresses the second fusion protein. In another preferred embodiment, the interaction is assayed in vitro and shown to function in vivo, i.e., as a conformationally constrained intracellular peptide.

In yet other aspects, the invention features a protein including the sequence Leu-Val-Cys-Lys-Ser-Tyr-Arg-Leu-Asp-Trp-Glu-Ala-Gly-Ala-Leu-Phe-Arg-Ser-Leu-Phe (SEQ ID NO: 1), preferably conformationally-constrained; a protein including the sequence Met-Val-Val-Ala-Ala-Glu-Ala-Val-Arg-Thr-Val-Leu-Leu-Ala-Asp-Gly-Gly-Asp-Val-Thr (SEQ ID NO: 2), preferably conformationally-constrained; a protein including the sequence Pro-Asn-Trp-Pro-His-Gln-Leu-Arg-Val-Gly-Arg-Val-Leu-Trp-Glu-Arg-Leu-Ser-Phe-Glu (SEQ ID NO: 3), preferably conformationally-constrained; a protein including the sequence Ser-Val-Arg-Met-Arg-Tyr-Gly-Ile-Asp-Ala-Phe-Phe-Asp-Leu-Gly-Gly-Leu-Leu-His-Gly (SEQ ID NO: 9), preferably conformationally-constrained; a protein including the sequence Glu-Leu-Arg-His-Arg-Leu-Gly-Arg-Ala-Leu-Ser-Glu-Asp-Met-Val-Arg-Gly-Leu-Ala-Trp-Gly-Pro-Thr-Ser-His-Cys-Ala-Thr-Val-Pro-Gly-Thr-Ser-Asp-Leu-Trp-Arg-Val-Ile-Arg-Phe-Leu (SEQ ID NO: 10), preferably conformationally-constrained; a protein including the sequence Tyr-Ser-Phe-Val-His-His-Gly-Phe-Phe-Asn-Phe-Arg-Val-Ser-Trp-Arg-Glu-Met-Leu-Ala (SEQ ID NO: 11), preferably conformationally-constrained; a protein including the sequence Gln-Val-Trp-Ser-Leu-Trp-Ala-Leu-Gly-Trp-Arg-Trp-Leu-Arg-Arg-Tyr-Gly-Trp-Asn-Met (SEQ ID NO: 12), preferably conformationally-constrained; a protein including the sequence Trp-Arg-Arg-Met-Glu-Leu-Asp-Ala-Glu-Ile-Arg-Trp-Val-Lys-Pro-Ile-Ser-Pro-Leu-Glu (SEQ ID NO: 13), preferably conformationally-constrained; a protein including the sequence Trp-Ala-Glu-Trp-Cys-Gly-Pro-Val-Cys-Ala-His-Gly-Ser-Arg-Ser-Leu-Thr-Leu-Leu-Thr-Lys-Tyr-His-Val-Ser-Phe-Leu-Gly-Pro-Cys-Lys-Met-Ile-Ala-Pro-Ile-Leu-Asp (SEQ ID NO: 17), preferably conformationally-constrained; a protein including the sequence Leu-Val-Cys-Lys-Ser-Tyr-Arg-Leu-Asp-Trp-Glu-Ala-Gly-Ala-Leu-Phe-Arg-Ser-Leu-Phe (SEQ ID NO: 18), preferably conformationally-constrained; a protein including the sequence Tyr-Arg-Trp-Gln-Gln-Gly-Val-Val-Pro-Ser-Asn-Trp-Ala-Ser-Cys-Ser-Phe-Arg-Cys-Gly (SEQ ID NO: 19), preferably conformationally-constrained; a protein including the sequence Ser-Ser-Phe-Ser-Leu-Trp-Leu-Leu-Met-Val-Lys-Ser-Ile-Lys-Arg-Ala-Ala-Trp-Glu-Leu-Gly-Pro-Ser-Ser-Ala-Trp-Asn-Thr-Ser-Gly-Trp-Ala-Ser-Leu-Ala-Asp-Phe-Tyr (SEQ ID NO: 20), preferably conformationally-constrained; a protein including the sequence Arg-Val-Lys-Leu-Gly-Tyr-Ser-Phe-Trp-Ala-Gln-Ser-Leu-Leu-Arg-Cys-Ile-Ser-Val-Gly (SEQ ID NO: 21), preferably conformationally-constrained; a protein including the sequence Gln-Leu-Tyr-Ala-Gly-Cys-Tyr-Leu-Gly-Val-Val-Ile-Ala-Ser-Ser-Leu-Ser-Ile-Arg-Val (SEQ ID NO: 22), preferably conformationally-constrained; a protein including the sequence Gln-Gln-Arg-Phe-Val-Phe-Ser-Pro-Ser-Trp-Phe-Thr-Cys-Ala-Gly-Thr-Ser-Asp-Phe-Trp-Gly-Pro-Glu-Pro-Leu-Phe-Asp-Trp-Thr-Arg-Asp (SEQ ID NO: 23), preferably conformationally-constrained; a protein including the sequence Arg-Pro-Leu-Thr-Gly-Arg-Trp-Val-Val-Trp-Gly-Arg-Arg-His-Glu-Glu-Cys-Gly-Leu-Thr (SEQ ID NO: 24), preferably conformationally-constrained; a protein including the sequence Pro-Val-Cys-Cys-Met-Met-Tyr-Gly-His-Arg-Thr-Ala-Pro-His-Ser-Val-Phe-Asn-Val-Asp (SEQ ID NO: 25), preferably conformationally-constrained; a protein including the sequence Trp-Ser-Pro-Glu-Leu-Leu-Arg-Ala-Met-Val-Ala-Phe-Arg-Trp-Leu-Leu-Glu-Arg-Arg-Pro (SEQ ID NO: 26); and substantially pure DNA encoding the immediately foregoing proteins.

The invention also includes novel proteins and other candidate interactors identified by the foregoing methods. It will be appreciated that these proteins and candidate interactors may either increase or decrease reporter gene activity and that these changes in activity may be measured using assays described herein or known in the art. Also included in the invention are methods for using conformationally constrained interactor proteins. For example, the conformationally constrained proteins of the invention may be used as reagents in assays for protein detection that involve formation of a complex between the conformationally constrained protein and a protein of interest to which it specifically binds, followed by complex detection (for example, by an immunoprecipitation, Western blot, or affinity column technique that utilizes the conformationally constrained protein as the complex-forming reagent).

Finally, the invention features a method of assaying an interaction between a first protein and a second protein, involving: (a) providing the first protein; (b) providing a fusion protein including the second protein, the second protein being conformationally-constrained; (c) contacting the first protein with the fusion protein under conditions which allow complex formation; (d) detecting the complex as an indication of an interaction; and (e) determining whether the first protein interacts with the fusion protein inside a cell.

As used herein, by “reporter gene” is meant a gene whose expression may be assayed; such genes include, without limitation, lacZ, amino acid biosynthetic genes, e.g., the yeast LEU2, HIS3, LYS2, TRP1, or URA3 genes, nucleic acid biosynthetic genes, the mammalian chloramphenicol transacetylase (CAT) gene, or any surface antigen gene for which specific antibodies are available. Reporter genes may encode any protein that provides a phenotypic marker, for example, a protein that is necessary for cell growth or a toxic protein leading to cell death, or may encode a protein detectable by a color assay leading to the presence or absence of color (e.g., fluorescent proteins and derivatives thereof). Alternatively, a reporter gene may encode a suppressor tRNA, the expression of which produces a phenotype that can be assayed. A reporter gene according to the invention includes elements (e.g., all promoter elements) necessary for reporter gene function.

By “operably linked” is meant that a gene and a regulatory sequence(s) are connected in such a way as to permit gene expression when the appropriate molecules (e.g., transcriptional activator proteins or proteins which include transcriptional activation domains) are bound to the regulatory sequence(s).

By “covalently bonded” is meant that two domains are joined by covalent bonds, directly or indirectly. That is, the “covalently bonded” proteins or protein moieties may be immediately contiguous or may be separated by stretches of one or more amino acids within the same fusion protein.

By “providing” is meant introducing the fusion proteins into the interaction system sequentially or simultaneously, and directly (as proteins) or indirectly (as genes encoding those proteins).

By “protein” is meant a sequence of amino acids of any length, constituting all or a part of a naturally-occurring polypeptide or peptide, or constituting a non-naturally-occurring polypeptide or peptide (e.g., a randomly generated peptide sequence or one of an intentionally designed collection of peptide sequences).

By a “binding moiety” is meant a stretch of amino acids which is capable of directing specific polypeptide binding to a particular DNA sequence (i.e., a “DNA-binding-protein recognition site”).

By “weak gene activating moiety” is meant a stretch of amino acids which is capable of weakly inducing the expression of a gene to whose control region it is bound. As used herein, by “weakly” is meant below the level of activation effected by GAL4 activation region II (Ma and Ptashne, Cell 48:847, 1987) and preferably at or below the level of activation effected by the B112 activation domain of Ma and Ptashne (Cell 51:113, 1987). Levels of activation may be measured using any downstream reporter gene system and comparing, in parallel assays, the level of expression stimulated by the GAL4 region II-polypeptide with the level of expression stimulated by the polypeptide to be tested.
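The parallel-assay comparison in the foregoing definition can be sketched as follows. This is a minimal illustration only: the function name, its arguments, and the label strings are hypothetical, and the reporter readings are assumed to come from parallel assays in the same reporter gene system and to be expressed in the same arbitrary units.

```python
from typing import Optional


def classify_activation_strength(
    test_level: float,
    gal4_region_ii_level: float,
    b112_level: Optional[float] = None,
) -> str:
    """Label a test activating moiety relative to reference activators.

    All levels are reporter readings from parallel assays (assumed to be
    in the same units). The B112 reference is optional.
    """
    # "Weak" requires activation strictly below GAL4 activation region II.
    if test_level >= gal4_region_ii_level:
        return "not weak"
    # Preferred case: at or below the B112 activation domain level.
    if b112_level is not None and test_level <= b112_level:
        return "weak (preferred)"
    return "weak"
```

For example, with hypothetical readings of 30 for the test polypeptide, 100 for the GAL4 region II polypeptide, and 40 for B112, the function returns "weak (preferred)".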

By “altering the expression of the reporter gene” is meant an increase or decrease in the expression of the reporter gene to the extent required for detection of a change in the assay being employed. It will be appreciated that the degree of change will vary depending upon the type of reporter gene construct or reporter gene expression assay being employed.

By “conformationally-constrained” is meant a protein that has reduced structural flexibility because its amino and carboxy termini are fixed in space. As a result of this constraint, the protein may form “loops” (i.e., regions of amino acids of any shape which extend away from the constrained amino and carboxy termini). Preferably, the conformationally-constrained protein is displayed in a structurally rigid manner. Conformational constraint according to the invention may be brought about by exploiting the disulfide-bonding ability of a natural or recombinantly-introduced pair of cysteine residues, one residing at or near the amino-terminal end of the protein of interest and the other at or near the carboxy-terminal end. Alternatively, conformational constraint may be facilitated by embedding the protein of interest within a conformation-constraining protein.
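As a sequence-level illustration of the cysteine-pair arrangement described above, the following sketch checks whether a peptide, written in one-letter amino acid code, has one cysteine at or near each terminus. The function name and the `window` parameter (how many residues from each end count as "near") are assumptions made for illustration; the specification does not define a numeric window. The check says nothing about whether a disulfide bond actually forms in a given host cell.

```python
def has_terminal_cysteine_pair(peptide: str, window: int = 3) -> bool:
    """Return True if distinct cysteines lie within `window` residues
    of the amino terminus and of the carboxy terminus."""
    seq = peptide.upper()
    n = len(seq)
    if n < 2:
        return False
    # Positions of cysteines near the amino-terminal end.
    n_term = {i for i in range(min(window, n)) if seq[i] == "C"}
    # Positions of cysteines near the carboxy-terminal end.
    c_term = {i for i in range(max(0, n - window), n) if seq[i] == "C"}
    # Two different residues are needed to form a pair.
    return any(a != b for a in n_term for b in c_term)
```

A peptide such as "CAAAAAAC" passes this check, whereas "CAAAAAAA" does not, because it lacks a cysteine near the carboxy terminus.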

By “conformation-constraining protein” is meant any peptide or polypeptide which is capable of reducing the flexibility of another protein's amino and/or carboxy termini. Preferably, such proteins provide a rigid scaffold or platform for the protein of interest. In addition, such proteins preferably are capable of providing protection from proteolytic degradation and the like, and/or are capable of enhancing solubility. Examples of conformation-constraining proteins include thioredoxin and other thioredoxin-like proteins, nucleases (e.g., RNase A), proteases (e.g., trypsin), protease inhibitors (e.g., bovine pancreatic trypsin inhibitor), antibodies or structurally-rigid fragments thereof, conotoxins, and the pleckstrin homology domain. A conformation-constraining peptide can be of any appropriate length and can even be a single amino acid residue.

“Thioredoxin-like proteins” are defined herein as amino acid sequences substantially similar, e.g., having at least 18% homology, with the amino acid sequence of E. coli thioredoxin over an amino acid sequence length of 80 amino acids. Alternatively, a thioredoxin-like DNA sequence is defined herein as a DNA sequence encoding a protein or fragment of a protein characterized by having a three dimensional structure substantially similar to that of human or E. coli thioredoxin, e.g., glutaredoxin, and optionally by containing an active-site loop. The DNA sequence of glutaredoxin is an example of a thioredoxin-like DNA sequence which encodes a protein that exhibits such substantial similarity in three-dimensional conformation and contains a Cys . . . Cys active-site loop. The amino acid sequence of E. coli thioredoxin is described in Eklund et al., EMBO J. 3:1443-1449 (1984). The three-dimensional structure of E. coli thioredoxin is depicted in FIG. 2 of Holmgren, J. Biol. Chem. 264:13963-13966 (1989). A DNA sequence encoding the E. coli thioredoxin protein is set forth in Lim et al., J. Bacteriol., 163:311-316 (1985). The three dimensional structure of human thioredoxin is described in Forman-Kay et al., Biochemistry 30:2685-98 (1991). A comparison of the three dimensional structures of E. coli thioredoxin and glutaredoxin is published in Xia, Protein Science 1:310-321 (1992). These publications are incorporated herein by reference for the purpose of providing information on thioredoxin-like proteins that is known to one of skill in the art. Examples of thioredoxin-like proteins are described herein.
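One way to read the 18% criterion over an 80-residue length is sketched below. The sketch makes several assumptions that are not in the specification: “homology” is approximated as percent identity, the two sequences are supplied already aligned to equal length (with “-” marking gaps), and the best-scoring 80-column window is the one reported. The function names are hypothetical.

```python
def max_window_identity(aligned_a: str, aligned_b: str, window: int = 80) -> float:
    """Highest percent identity over any `window`-column stretch of a
    pre-computed pairwise alignment (gap columns never count as matches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if len(aligned_a) < window:
        raise ValueError("alignment is shorter than the comparison window")
    matches = [1 if x == y and x != "-" else 0
               for x, y in zip(aligned_a, aligned_b)]
    # Sliding-window sum of matching columns.
    current = sum(matches[:window])
    best = current
    for i in range(window, len(matches)):
        current += matches[i] - matches[i - window]
        best = max(best, current)
    return 100.0 * best / window


def is_thioredoxin_like(aligned_candidate: str, aligned_reference: str,
                        threshold: float = 18.0, window: int = 80) -> bool:
    """Apply the illustrative identity threshold over the window."""
    return max_window_identity(aligned_candidate, aligned_reference,
                               window) >= threshold
```

In practice the alignment itself would be produced by a separate sequence-alignment tool; only the thresholding step is illustrated here.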

By “candidate interactors” is meant proteins (“candidate interacting proteins”) or compounds which physically interact with a protein of interest; this term also encompasses agonists and antagonists. Agonist interactors are identified as compounds or proteins that have the ability to increase reporter gene expression mediated by a pair of interacting proteins. Antagonist interactors are identified as compounds or proteins that have the ability to decrease reporter gene expression mediated by a pair of interacting proteins. Candidate interactors also include so-called peptide “aptamers” which specifically recognize target proteins and may be used in a manner analogous to antibody reagents; such aptamers may include one or more loops.

“Compounds” include small molecules, generally under 1000 MW, carbohydrates, polynucleotides, lipids, and the like.

By “test protein” is meant one of a pair of interacting proteins, the other member of the pair generally referred to as a “candidate interactor” (supra).

By “randomly generated” is meant sequences having no predetermined sequence; this is contrasted with “intentionally designed” sequences which have a DNA or protein sequence or motif determined prior to their synthesis.
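The contrast between the two kinds of sequence can be illustrated at the peptide level as follows. This is only a sketch: the function names are hypothetical, uniform sampling over the twenty standard amino acids is an assumption, and actual libraries are typically built from synthetic DNA rather than generated as peptide strings.

```python
import random

# One-letter codes for the twenty standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length: int, rng: random.Random) -> str:
    """Randomly generated: no residue is fixed before synthesis."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def designed_peptides(template: str, position: int, allowed: str) -> list:
    """Intentionally designed: a predetermined template is varied at one
    chosen position, using only a chosen set of residues."""
    return [template[:position] + residue + template[position + 1:]
            for residue in allowed]
```

For instance, `designed_peptides("AAAA", 1, "CG")` yields the two variants "ACAA" and "AGAA", while `random_peptide(20, random.Random(0))` yields a 20-residue peptide with no predetermined sequence.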

By “mutated” is meant altered in sequence, either by site-directed or random mutagenesis. A mutated form of a protein encompasses point mutations as well as insertions, deletions, or rearrangements.

By “intracellular” is meant that the peptide is localized inside the cell, rather than on the cell surface.

By an “activated Ras” is meant any mutated form of Ras which remains bound to GTP for a period of time longer than that exhibited by the corresponding wild-type form of the protein. By “Ras” is meant any form of Ras protein including, without limitation, N-ras, K-ras, and H-ras.

The interaction trap systems described herein provide advantages over more conventional methods for isolating interacting proteins or genes encoding interacting proteins. For example, applicants' systems provide rapid and inexpensive methods having very general utility for identifying and purifying genes encoding a wide range of useful proteins based on the protein's physical interaction with a second polypeptide. This general utility derives in part from the fact that the components of the systems can be readily modified to facilitate detection of protein interactions of widely varying affinity (e.g., by using reporter genes which differ quantitatively in their sensitivity to a protein interaction). The inducible nature of the promoter used to express the interacting proteins also increases the scope of candidate interactors which may be detected since even proteins whose chronic expression is toxic to the host cell may be isolated simply by inducing a short burst of the protein's expression and testing for its ability to interact and stimulate expression of a reporter gene.

If desired, detection of interacting proteins may be accomplished through the use of weak gene activation domain tags. This approach avoids restrictions on the pool of available candidate interacting proteins which may be associated with stronger activation domains (such as GAL4 or VP16); although the mechanism is unclear, such a restriction apparently results from low to moderate levels of host cell toxicity mediated by the strong activation domain.

In addition, the claimed methods make use of conformationally-constrained proteins (i.e., proteins with reduced flexibility due to constraints at their amino and carboxy termini). Conformational constraint may be brought about by embedding the protein of interest within a conformation-constraining protein (i.e., a protein of appropriate length and amino acid composition to be capable of locking the candidate interacting protein into a particular three-dimensional structure). Examples of conformation-constraining proteins include, but are not limited to, thioredoxin (or other thioredoxin-like proteins), nucleases (e.g., RNase A), proteases (e.g., trypsin), protease inhibitors (e.g., bovine pancreatic trypsin inhibitor), antibodies or structurally-rigid fragments thereof, conotoxins, and the pleckstrin homology domain.

Alternatively, conformational constraint may be accomplished by exploiting the disulfide-bonding ability of a natural or recombinantly-introduced pair of cysteine residues, one residing at the amino terminus of the protein of interest and the other at its carboxy terminus. Such disulfide bonding locks the protein into a rigid and therefore conformationally-constrained loop structure. Disulfide bonds between amino-terminal and carboxy-terminal cysteines may be formed, for example, in the cytoplasm of E. coli trxB mutant strains. Under some conditions disulfide bonds may also form within the cytoplasm and nucleus of higher organisms harboring equivalent mutations, for example, an S. cerevisiae YTR4⁻ mutant strain (Furter et al., Nucl. Acids Res. 14:6357-6373, 1986; GenBank Accession Number P29509). In addition, the thioredoxin fusions described herein (trxA fusions) are amenable to this alternative means of introducing conformational constraint, since the cysteines at the base of peptides inserted within the thioredoxin active-site loop are at a proper distance from one another to form disulfide bonds under appropriate conditions.

Conformationally-constrained proteins as candidate interactors are useful in the invention because they are amenable to tertiary structural analysis, thus facilitating the design of simple organic molecule mimetics with improved pharmacological properties. For example, because thioredoxin has a known structure, the protein structure between the conformationally constrained regions may be more easily solved using methods such as NMR and X-ray difference analysis. Certain conformation-constraining proteins also protect the embedded protein from cellular degradation and/or increase the protein's solubility, and/or otherwise alter the capacity of the candidate interactor to interact.

Once isolated, interacting proteins can also be analyzed using the interaction trap system, with the signal generated by the interaction being an indication of any change in the proteins' interaction capabilities. In one particular example, an alteration is made (e.g., by standard in vivo or in vitro directed or random mutagenesis procedures) to one or both of the interacting proteins, and the effect of the alteration(s) is monitored by measuring reporter gene expression. Using this technique, interacting proteins with increased or decreased interaction potential are isolated. Such proteins are useful as therapeutic molecules (for example, agonists or antagonists) or, as described above, as models for the design of simple organic molecule mimetics.

Protein agonists and antagonists may also be readily identified and isolated using a variation of the interaction trap system. In particular, once a protein-protein interaction has been recorded, an additional DNA coding for a candidate agonist or antagonist, or preferably, one of a library of potential agonist- or antagonist-encoding sequences is introduced into the host cell, and reporter gene expression is measured. Alternatively, candidate interactor agonist or antagonist compounds (i.e., including polypeptides as well as non-proteinaceous compounds, e.g., single stranded polynucleotides) are introduced into an in vivo or in vitro interaction trap system according to the invention and their ability to affect reporter gene expression is measured. A decrease in reporter gene expression (compared to a control lacking the candidate sequence or compound) indicates an antagonist. Conversely, an increase in reporter gene expression (compared again to a control) indicates an agonist. Interaction agonists and antagonists are useful as therapeutic agents or as models to design simple mimetics; if desired, an agonist or antagonist protein may be conformationally-constrained to provide the advantages described herein. Particular examples of interacting proteins for which antagonists or agonists may be identified include, but are not limited to, the IL-6 receptor-ligand pair, TGF-β receptor-ligand pair, IL-1 receptor-ligand pair and other receptor-ligand interactions, protein kinase-substrate pairs, interacting pairs of transcription factors, interacting components of signal transduction pathways (for example, cytoplasmic domains of certain receptors and G-proteins), pairs of interacting proteins involved in cell cycle regulation (for example, p16 and CDK4), and neurotransmitter pairs.
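The comparison against a control lacking the candidate can be sketched as follows. The function name and the `min_fold_change` detection threshold are hypothetical; the specification states only that the change must be large enough to be detected in the assay being employed, so the default of 2.0 is an assumption for illustration and would be set per assay.

```python
def classify_candidate(reporter_with_candidate: float,
                       reporter_control: float,
                       min_fold_change: float = 2.0) -> str:
    """Compare reporter expression with and without a candidate.

    `reporter_control` is the reading from a control lacking the candidate
    sequence or compound; both readings are assumed to be in the same units.
    """
    if reporter_control <= 0 or reporter_with_candidate < 0:
        raise ValueError("control must be positive and readings non-negative")
    if min_fold_change <= 1:
        raise ValueError("min_fold_change must be greater than 1")
    ratio = reporter_with_candidate / reporter_control
    if ratio >= min_fold_change:
        return "agonist"        # reporter expression increased
    if ratio <= 1.0 / min_fold_change:
        return "antagonist"     # reporter expression decreased
    return "no detectable effect"
```

With a control reading of 100 units, a reading of 20 units in the presence of the candidate would be labeled "antagonist" under the default threshold, and a reading of 250 units would be labeled "agonist".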

Also included in the present invention are libraries encoding conformationally-constrained proteins. Such libraries (which may include natural as well as synthetic DNA sequence collections) are expressed intracellularly or, optionally, in cell-free systems, and may be used together with any standard genetic selection or screen or with any of a number of interaction trap formats for the identification of interacting proteins, agonist or antagonist proteins, or proteins that endow a cell with any identifiable characteristic, for example, proteins that perturb cell cycle progression. Accordingly, peptide-encoding libraries (either random or designed) can be used in selections or screens which either are or are not transcriptionally-based. These libraries (which preferably include at least 100 different peptide-encoding species and more preferably include 1000, or 100,000 or greater individual species) may be transformed into any useful prokaryotic or eukaryotic host, with yeast representing the preferred host. Alternatively, such peptide-encoding libraries may be expressed in cell-free systems.
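A library's diversity relative to the preferred minimum of 100 different peptide-encoding species could be checked, for example, from sequenced inserts. The following sketch assumes the inserts are available as plain sequence strings; the function names are hypothetical, and treating sequences that differ only in letter case or surrounding whitespace as the same species is an assumption.

```python
from typing import Iterable


def count_distinct_species(sequences: Iterable[str]) -> int:
    """Count different peptide-encoding species among sequenced inserts,
    ignoring empty entries, letter case, and surrounding whitespace."""
    return len({s.strip().upper() for s in sequences if s.strip()})


def meets_minimum_diversity(sequences: Iterable[str], minimum: int = 100) -> bool:
    """True if the library has at least `minimum` different species."""
    return count_distinct_species(sequences) >= minimum
```

The same check can be run with `minimum=1000` or `minimum=100000` for the more preferred library sizes.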

Other features and advantages of the invention will be apparent from the following detailed description thereof, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings are first briefly described.

FIGS. 1A-1C illustrate one interaction trap system according to the invention.

FIG. 2 is a diagram of a library vector pJM1.

FIG. 3A is a photograph showing the interaction of peptide aptamers with other proteins.

FIG. 3B illustrates the sequence of exemplary Cdk2 interacting peptides.

FIG. 4A is a photograph showing the interaction of peptide aptamers with other proteins. The designations of these peptide aptamers differ from those shown in FIG. 3A, and correspond to the numbering shown in FIG. 4B. To carry out these experiments, yeast strain EGY48 was transformed with either a plasmid expressing an anti-Cdk2 aptamer or with a plasmid expressing a control 20-mer peptide loop, and the strain was then mated to different bait strains as described in Finley et al. (Proc. Natl. Acad. Sci. U.S.A. 91:12980-12984 (1994)).

FIG. 4B illustrates the sequence of the exemplary Cdk2 interacting peptide aptamers assayed in FIG. 4A.

FIG. 5 illustrates coprecipitation of peptides 3 and 13 by Gst-Cdk2.

Lane 1. Gst beads, extract contains TrxA;
Lane 2. Gst beads, extract contains TrxA-peptide 3;
Lane 3. Gst beads, extract contains TrxA-peptide 13;
Lane 4. Gst-Cdk2 beads, extract contains TrxA;
Lane 5. Gst-Cdk2 beads, extract contains TrxA-peptide 3; and
Lane 6. Gst-Cdk2 beads, extract contains TrxA-peptide 13.

FIG. 6 illustrates coprecipitation of the peptide aptamers of FIGS. 4A and 4B.

FIG. 7 illustrates a representative binding affinity graph produced using an evanescent wave instrument.

FIG. 8 illustrates the ability of exemplary peptide aptamers of FIGS. 4A and 4B to inhibit phosphorylation of Histone H1 by Cdk2/cyclin E kinase.

FIG. 9 illustrates the vector BRM116-H-Ras(G12V).

FIG. 10 illustrates the vector pEG202-H-Ras(G12V).

DETAILED DESCRIPTION

Applicants have developed a novel interaction trap system for the identification and analysis of conformationally-constrained proteins that either physically interact with a second protein of interest or that antagonize or agonize such an interaction. In one embodiment, the system involves a eukaryotic host strain (e.g., a yeast strain) which is engineered to produce a protein of therapeutic or diagnostic interest as a fusion protein covalently bonded to a known DNA binding domain; this protein is referred to as a “bait” protein because its purpose in the system is to “catch” useful, but as yet unknown or uncharacterized, interacting polypeptides (termed the “prey”; see below). The eukaryotic host strain also contains one or more “reporter genes,” i.e., genes whose transcription is detected in response to a bait-prey interaction. Bait proteins, via their DNA binding domain, bind to their specific DNA recognition site upstream of a reporter gene; reporter transcription is not stimulated, however, because the bait protein lacks an activation domain.

To isolate DNA sequences encoding novel interacting proteins, members of a DNA expression library (e.g., a cDNA or synthetic DNA library, either random or intentionally biased) are introduced into the strain containing the reporter gene and bait protein; each member of the library directs the synthesis of a candidate interacting protein fused to an invariant gene activation domain tag. Those library-encoded proteins that physically interact with the promoter-bound bait protein are referred to as “prey” proteins. Such bound prey proteins (via their activation domain tag) detectably activate the expression of the downstream reporter gene and provide a ready assay for identifying a particular DNA clone encoding an interacting protein of interest. In the instant invention, each candidate prey protein is conformationally-constrained (for example, either by embedding the protein within a conformation-constraining protein or by linking together the protein's amino and carboxy termini). Such a protein is maintained in a fixed, three-dimensional structure, facilitating mimetic drug design.

An example of one interaction trap system according to the invention is shown in FIGS. 1A-C. FIG. 1A shows a leucine auxotroph yeast strain containing two reporter genes, LexAop-LEU2 and LexAop-lacZ, and a constitutively expressed bait protein gene. The bait protein (shown as a pentagon) is fused to a DNA binding domain (shown as a circle). The DNA binding protein recognizes and binds a specific DNA-binding-protein recognition site (shown as a solid rectangle) operably-linked to a reporter gene. In FIGS. 1B and 1C, the cells additionally contain candidate prey proteins (candidate interactors) (shown as an empty rectangle in 1B and an empty hexagon in 1C) fused to an activation domain (shown as a solid square); each prey protein is embedded in a conformation-constraining protein (shown as two solid half circles). FIG. 1B shows that if the candidate prey protein does not interact with the transcriptionally-inert LexA-fusion bait protein, the reporter genes are not transcribed; the cell cannot grow into a colony on leu⁻ medium, and it is white on Xgal medium because it contains no β-galactosidase activity. FIG. 1C shows that, if the candidate prey protein interacts with the bait, both reporter genes are active; the cell forms a colony on leu⁻ medium, and cells in that colony have β-galactosidase activity and are blue on Xgal medium. Preferably, in this system, the bait protein (i.e., the protein containing a site-specific DNA binding domain) is transcriptionally inert, and the reporter genes (which are bound by the bait protein) have essentially no basal transcription.
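By way of illustration only, the reporter readout described for FIGS. 1B and 1C can be summarized as a small truth table. The following Python sketch is not part of the disclosed system; the names `Phenotype` and `reporter_phenotype` are hypothetical, and the sketch assumes the idealized case of a transcriptionally inert bait and reporters with no basal transcription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    """Observable readout of the two reporters (hypothetical helper type)."""
    grows_without_leucine: bool   # LexAop-LEU2 reporter
    colony_color_on_xgal: str     # LexAop-lacZ reporter


def reporter_phenotype(prey_binds_bait: bool) -> Phenotype:
    """Idealized readout: both reporters are on only if the prey binds the bait.

    Assumes an inert bait and no basal reporter transcription, so the
    activation domain reaches the LexA operators only via a bait-prey contact.
    """
    reporters_on = prey_binds_bait
    return Phenotype(
        grows_without_leucine=reporters_on,
        colony_color_on_xgal="blue" if reporters_on else "white",
    )


if __name__ == "__main__":
    for binds in (False, True):
        print(binds, reporter_phenotype(binds))
```

In practice the two reporters differ in sensitivity, so a real screen may show growth on leu⁻ medium with only faint color; the sketch deliberately ignores that gradation.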

Each component of the system is now described in more detail.

Bait Proteins

The selection host strain depicted in FIGS. 1A-C contains a DNA encoding a bait protein fused to a DNA encoding a DNA binding moiety derived from the bacterial LexA protein. The use of a LexA DNA binding domain provides certain advantages. For example, in yeast, the LexA moiety contains no activation function and has no known effect on transcription of yeast genes (Brent and Ptashne, Nature 312:612-615, 1984; Brent and Ptashne, Cell 43:729-736, 1985). In addition, use of LexA rather than, for example, the GAL4 DNA-binding domain allows conditional expression of prey proteins in response to galactose induction; this facilitates detection of prey proteins that might be toxic to the host cell if expressed continuously. Finally, the use of a well-defined system, such as LexA, allows knowledge regarding the interaction between LexA and the LexA binding site (i.e., the LexA operator) to be exploited for the purpose of optimizing operator occupancy and/or optimizing the geometry of the bound bait protein to effect maximal gene activation.

Preferably, the bait protein also includes a LexA dimerization domain; this optional domain facilitates efficient LexA dimer formation. Because LexA binds its DNA binding site as a dimer, inclusion of this domain in the bait protein also optimizes the efficiency of operator occupancy (Golemis and Brent, Mol. Cell Biol. 12:3006-3014 (1992)).

LexA represents a preferred DNA binding domain in the invention. However, any other transcriptionally-inert or essentially transcriptionally-inert DNA binding domain may be used in the interaction trap system; such DNA binding domains are well known and include the DNA binding portions of the proteins ACE1 (CUP1), lambda cI, lac repressor, jun, fos, GCN4, or the Tet repressor. The GAL4 DNA binding domain represents a slightly less preferred DNA binding moiety for the bait proteins.

Bait proteins may be chosen from any protein of interest and include proteins of unknown, known, or suspected diagnostic, therapeutic, or pharmacological importance. Preferred bait proteins include oncoproteins (such as myc, particularly the C-terminus of myc, ras, src, fos, and particularly the oligomeric interaction domains of fos) or any other proteins involved in cell cycle regulation (such as kinases, phosphatases, and the cytoplasmic portions of membrane-associated receptors). Particular examples of preferred bait proteins include cyclins and cyclin dependent kinases (for example, Cdk2) or receptor-ligand pairs, or neurotransmitter pairs, or pairs of other signalling proteins. In each case, the protein of interest is fused to a known DNA binding domain as generally described herein. Examples are provided below using Cdk2 and Ras baits.

Reporters

As shown in FIG. 1B, one preferred host strain according to the invention contains two different reporter genes, the LEU2 gene and the lacZ gene, each carrying an upstream binding site for the bait protein. The reporter genes depicted in FIG. 1B each include, as an upstream binding site, one or more LexA operators in place of their native Upstream Activation Sequences (UASs). These reporter genes may be integrated into the chromosome or may be carried on autonomously replicating plasmids (e.g., yeast 2μ plasmids).

A combination of two such reporters is preferred in the in vivo embodiments of the invention for a number of reasons. First, the LexAop-LEU2 construction allows cells that contain interacting proteins to select themselves by growth on medium that lacks leucine, facilitating the examination of large numbers of potential candidate interactor protein-containing cells. Second, the LexAop-lacZ reporter allows LEU⁺ cells to be quickly screened to confirm an interaction. And, third, among other technical considerations, the LexAop-LEU2 reporter provides an extremely sensitive first selection, while the LexAop-lacZ reporter allows discrimination between proteins of different interaction affinities.

Although the reporter genes described herein represent a preferred embodiment of the invention, other equivalent genes whose expression may be detected or assayed by standard techniques may also be employed in conjunction with, or instead of, the LEU2 and lacZ genes. Generally, such reporter genes encode an enzyme that provides a phenotypic marker, for example, a protein that is necessary for cell growth or a toxic protein leading to cell death, or encode a protein detectable by a color assay or because its expression leads to the presence or absence of color. Alternatively, the reporter gene may encode a suppressor tRNA whose expression may be assayed, for example, because it suppresses a lethal host cell mutation. Particular examples of other useful genes whose transcription can be detected include amino acid and nucleic acid biosynthetic genes (such as yeast HIS3, URA3, TRP1, and LYS2), GAL1, E. coli galK (which complements the yeast GAL1 gene), and the reporter genes CAT, GUS, fluorescent proteins and derivatives thereof, and any gene encoding a cell surface antigen for which antibodies are available (e.g., CD4). Reporter genes may be assayed by either qualitative or quantitative means to distinguish candidate interactors as agonists or antagonists.

Prey Proteins

In the selection described herein, another DNA construction is utilized which encodes a series of candidate interacting proteins (i.e., prey proteins); each is conformationally-constrained, either by being embedded in a conformation-constraining protein or because the prey protein's amino and carboxy termini are linked (e.g., by disulfide bonding). An exemplary prey protein includes an invariant N-terminal moiety carrying, amino to carboxy terminal, an ATG for protein expression, an optional nuclear localization sequence, a weak activation domain (e.g., the B112 or B42 activation domains of Ma and Ptashne, Cell 51:113, 1987), and an optional epitope tag for rapid immunological detection of fusion protein synthesis. Library sequences, random or intentionally designed synthetic DNA sequences, or sequences encoding conformationally-constrained proteins, may be inserted downstream of this N-terminal fragment to produce fusion genes encoding prey proteins.

Prey proteins other than those described herein are also useful in the invention. For example, cDNAs may be constructed from any mRNA population and inserted into an equivalent expression vector. Such a library of choice may be constructed de novo using commercially available kits (e.g., from Stratagene, La Jolla, Calif.) or using well established preparative procedures (see, e.g., Current Protocols in Molecular Biology, New York, John Wiley & Sons, 1987). Alternatively, a number of cDNA libraries (from a number of different organisms) are publicly and commercially available; sources of libraries include, e.g., Clontech (Palo Alto, Calif.) and Stratagene (La Jolla, Calif.). It is also noted that prey proteins need not be naturally occurring full-length polypeptides. In preferred embodiments, prey proteins are encoded by synthetic DNA sequences, are the products of randomly generated open reading frames, are open reading frames synthesized with an intentional sequence bias, or are portions thereof. Preferably, such short randomly generated sequences encode peptides between 1 (and preferably, 6) and 60 amino acids in length. In one particular example, the prey protein includes only an interaction domain; such a domain may be useful as a therapeutic to modulate bait protein activity (i.e., as an antagonist or agonist). In another particular example, the prey protein contains one or more loops. Such a prey protein may be used as an immunological reagent for diagnostic purposes or for any of the therapeutic purposes described herein; in this context, the different loops may recognize different portions of the bait protein and may increase specificity. In addition, a prey protein may be a combination of multiple interacting peptides connected in reading frame (if desired, with alternating conformation-constraining sequences) to provide a further optimized prey protein. In one example, each of these interacting peptides constitutes one loop of the final interacting protein.

Similarly, any number of activation domains may be used for that portion of the prey molecule; such activation domains are preferably weak activation domains, i.e., weaker than the GAL4 activation region II moiety and preferably no stronger than B112 (as measured, e.g., by a comparison with GAL4 activation region II or B112 in parallel β-galactosidase assays using lacZ reporter genes); such a domain may, however, be weaker than B112. In particular, the extraordinary sensitivity of the LEU2 selection scheme allows even extremely weak activation domains to be utilized in the invention. Examples of other useful weak activation domains include B17, B42, and the amphipathic helix (AH) domains described in Ma and Ptashne (Cell 51:113, 1987), Ruden et al. (Nature 350:426-430, 1991), and Giniger and Ptashne (Nature 330:670, 1987).

The prey proteins, if desired, may include other optional nuclear localization sequences (e.g., those derived from the GAL4 or MATα2 genes) or other optional epitope tags (e.g., portions of the c-myc protein or the flag epitope available from Immunex). These sequences optimize the efficiency of the system, but are not required for its operation. In particular, the nuclear localization sequence optimizes the efficiency with which prey molecules reach the nuclear-localized reporter gene construct(s), thus increasing their effective concentration and allowing one to detect weaker protein interactions. The epitope tag merely facilitates a simple immunoassay for fusion protein expression.

Those skilled in the art will also recognize that the above-described reporter gene, DNA binding domain, and gene activation domain components may be derived from any appropriate eukaryotic or prokaryotic source, including yeast, mammalian cell, and prokaryotic cell genomes or cDNAs as well as artificial sequences. Moreover, although yeast represents a preferred host organism for the interaction trap system (for reasons of ease of propagation, genetic manipulation, and large scale screening), other host organisms such as mammalian cells may also be utilized. If a mammalian system is chosen, a preferred reporter gene is the sensitive and easily assayed CAT gene; useful DNA binding domains and gene activation domains may be chosen from those described above (e.g., the LexA DNA binding domain and the B42 or B112 activation domains).

Conformation-Constraining Proteins

According to one embodiment of the present invention, the DNA sequence encoding the prey protein is embedded in a DNA sequence encoding a conformation-constraining protein (i.e., a protein that decreases the flexibility of the amino and carboxy termini of the prey protein). Methods for directly linking the amino and carboxy termini of a protein (e.g., through disulfide bonding of appropriately positioned cysteine residues) are described above. As an alternative to this approach, conformation-constraining proteins may be utilized. In general, conformation-constraining proteins act as scaffolds or platforms, which limit the number of possible three dimensional configurations the peptide or protein of interest is free to adopt. Preferred examples of conformation-constraining proteins are thioredoxin or other thioredoxin-like sequences, but many other proteins are also useful for this purpose. Preferably, conformation-constraining proteins are small in size (generally, less than or equal to 200 amino acids), rigid in structure, of known three dimensional configuration, and are able to accommodate insertions of proteins of interest without undue disruption of their structures. A key feature of such proteins is the availability, on their solvent exposed surfaces, of locations where peptide insertions can be made (e.g., the thioredoxin active-site loop). It is also preferable that conformation-constraining protein producing genes be highly expressible in various prokaryotic and eukaryotic hosts, or in suitable cell-free systems, and that the proteins be soluble and resistant to protease degradation. Examples of conformation-constraining proteins useful in the invention include nucleases (e.g., RNase A), proteases (e.g., trypsin), protease inhibitors (e.g., bovine pancreatic trypsin inhibitor), antibodies or rigid fragments thereof, conotoxins, and the pleckstrin homology domain. This list, however, is not limiting. It is expected that other conformation-constraining proteins having sequences not identified above, or perhaps not yet identified or published, may be useful based upon their structural stability and rigidity.

As mentioned above, one preferred conformation-constraining protein according to the invention is thioredoxin or other thioredoxin-like proteins. As one example of a thioredoxin-like protein useful in this invention, E. coli thioredoxin has the following characteristics. E. coli thioredoxin is a small protein, only 11.7 kD, and can be produced to high levels. The small size and capacity for high level synthesis of the protein contribute to a high intracellular concentration. E. coli thioredoxin is further characterized by a very stable, tight tertiary structure which can facilitate protein purification.

The three dimensional structure of E. coli thioredoxin is known and contains several surface loops, including a distinctive Cys . . . Cys active-site loop between residues Cys₃₃ and Cys₃₆ which protrudes from the body of the protein. This Cys . . . Cys active-site loop is an identifiable, accessible surface loop region and is not involved in interactions with the rest of the protein which contribute to overall structural stability. It is therefore a good candidate as a site for prey protein insertions. Human thioredoxin, glutaredoxin, and other thioredoxin-like molecules also contain this Cys . . . Cys active-site loop. Both the amino- and carboxyl-termini of E. coli thioredoxin are on the surface of the protein and are also readily accessible for fusion construction. E. coli thioredoxin is also stable to proteases, stable in heat up to 80° C. and stable to low pH.

Other thioredoxin-like proteins encoded by thioredoxin-like DNA sequences useful in this invention share homologous amino acid sequences, and similar physical and structural characteristics. Thus, DNA sequences encoding other thioredoxin-like proteins may be used in place of E. coli thioredoxin according to this invention. For example, DNA sequences encoding other species' thioredoxin, e.g., human thioredoxin, are suitable. Human thioredoxin has a three-dimensional structure that is virtually superimposable on E. coli's three-dimensional structure, as determined by comparing the NMR structures of the two molecules (Forman-Kay et al., Biochem. 30:2685 (1991)). Human thioredoxin also contains an active-site loop structurally and functionally equivalent to the Cys . . . Cys active-site loop found in the E. coli protein. It can be used in place of or in addition to E. coli thioredoxin in the production of proteins and small peptides in accordance with the method of this invention. Insertions into the human thioredoxin active-site loop and onto the amino terminus may be as well-tolerated as those in E. coli thioredoxin.

Other thioredoxin-like sequences which may be employed in this invention include all or portions of the proteins glutaredoxin and various species' homologs thereof (Holmgren, supra). Although E. coli glutaredoxin and E. coli thioredoxin share less than 20% amino acid homology, the two proteins do have conformational and functional similarities (Eklund et al., EMBO J. 3:1443-1449 (1984)) and glutaredoxin contains an active-site loop structurally and functionally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. Glutaredoxin is therefore a thioredoxin-like molecule as defined herein.

In addition, the DNA sequence encoding protein disulfide isomerase (PDI), or that portion containing the thioredoxin-like domain, and its various species' homologs (Edman et al., Nature 317:267-270 (1985)) may also be employed as a thioredoxin-like DNA sequence, since a repeated domain of PDI shares >30% homology with E. coli thioredoxin and that repeated domain contains an active-site loop structurally and functionally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. The two latter publications are incorporated herein by reference for the purpose of providing information on glutaredoxin and PDI which is known and available to one of skill in the art.

Similarly, the DNA sequence encoding phosphoinositide-specific phospholipase C (PI-PLC), fragments thereof, and various species' homologs thereof (Bennett et al., Nature, 334:268-270 (1988)) may also be employed in the present invention as a thioredoxin-like sequence based on the amino acid sequence homology with E. coli thioredoxin, or alternatively based on similarity in three dimensional conformation and the presence of an active-site loop structurally and functionally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. All or a portion of the DNA sequence encoding an endoplasmic reticulum protein, ERp72, or various species' homologs thereof are also included as thioredoxin-like DNA sequences for the purposes of this invention (Mazzarella et al., J. Biol. Chem. 265:1094-1101 (1990)) based on amino acid sequence homology, or alternatively based on similarity in three dimensional conformation and the presence of an active-site loop structurally and functionally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. Another thioredoxin-like sequence is a DNA sequence which encodes all or a portion of an adult T-cell leukemia-derived factor (ADF) or other species' homologs thereof (Wakasugi et al., Proc. Natl. Acad. Sci. USA, 87:8282-8286 (1990)). ADF is now believed to be human thioredoxin. Similarly, the protein responsible for promoting disulfide bond formation in the periplasm of E. coli, the product of the dsbA gene (Bardwell et al., Cell 67:581-89, 1991) also can be considered a thioredoxin-like sequence. The three latter publications are incorporated herein by reference for the purpose of providing information on PI-PLC, ERp72, ADF, and dsbA which are known and available to one of skill in the art.

It is expected from the definition of thioredoxin-like sequences used above that other sequences not specifically identified above, or perhaps not yet identified or published, may be useful as thioredoxin-like sequences based on their amino acid sequence homology to E. coli thioredoxin or based on having three dimensional structures substantially similar to E. coli or human thioredoxin and having an active-site loop functionally and structurally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. One skilled in the art can determine whether a molecule has these latter two characteristics by comparing its three-dimensional structure, as analyzed for example by x-ray crystallography or two-dimensional NMR spectroscopy, with the published three-dimensional structure for E. coli thioredoxin and by analyzing the amino acid sequence of the molecule to determine whether it contains an active-site loop that is structurally and functionally equivalent to the Cys . . . Cys active-site loop of E. coli thioredoxin. By “substantially similar” in three-dimensional structure or conformation is meant as similar to E. coli thioredoxin as is glutaredoxin. In addition, a predictive algorithm has been described which enables the identification of thioredoxin-like proteins via computer-assisted analysis of primary sequence (Ellis et al., Biochemistry 31:4882-91 (1992)). Based on the above description, one of skill in the art will be able to select and identify, or, if desired, modify, a thioredoxin-like DNA sequence for use in this invention without resort to undue experimentation. For example, simple point mutations made to portions of native thioredoxin or native thioredoxin-like sequences which do not affect the structure of the resulting molecule are alternative thioredoxin-like sequences, as are allelic variants of native thioredoxin or native thioredoxin-like sequences.

DNA sequences which hybridize to the sequence for E. coli thioredoxin or its structural homologs under either stringent or relaxed hybridization conditions also encode thioredoxin-like proteins for use in this invention. An example of one such stringent hybridization condition is hybridization at 4×SSC at 65° C., followed by a washing in 0.1×SSC at 65° C. for an hour. Alternatively, an exemplary stringent hybridization condition is in 50% formamide, 4×SSC at 42° C. Examples of non-stringent hybridization conditions are 4×SSC at 50° C. or hybridization with 30-40% formamide at 42° C. The use of all such thioredoxin-like sequences is believed to be encompassed in this invention.

It may be preferred for a variety of reasons that prey proteins be fused within the active-site loop of thioredoxin or thioredoxin-like molecules. The face of thioredoxin surrounding the active-site loop has evolved, in keeping with the protein's major function as a nonspecific protein disulfide oxido-reductase, to be able to interact with a wide variety of protein surfaces. The active-site loop region is found between segments of strong secondary structure and this provides a rigid platform to which one may tether prey proteins.

A small prey protein inserted into the active-site loop of a thioredoxin-like protein is present in a region of the protein which is not involved in maintaining tertiary structure. Therefore the structure of such a fusion protein is stable. Indeed, E. coli thioredoxin can be cleaved into two fragments at a position close to the active-site loop, and yet the tertiary interactions stabilizing the protein remain.

The active-site loop of E. coli thioredoxin has the sequence NH₂ . . . Cys₃₃-Gly-Pro-Cys₃₆ . . . COOH. Fusing a selected prey protein with a thioredoxin-like protein in the active loop portion of the protein constrains the prey at both ends, reducing the degrees of conformational freedom of the prey protein, and consequently reducing the number of alternative structures taken by the prey. The inserted prey protein is bound at each end by cysteine residues, which may form a disulfide linkage to each other as they do in native thioredoxin and further limit the conformational freedom of the inserted prey.

In addition, by being positioned within the active-site loop, the prey protein is placed on the surface of the thioredoxin-like protein, an advantage for use in screening for bioactive protein conformations and other assays. In general, the utility of thioredoxin or other thioredoxin-like proteins is described in McCoy et al., U.S. Pat. No. 5,270,181 and LaVallie et al., Bio/Technology 11:187-193 (1993). These two references are hereby incorporated by reference.

There now follows a description of thioredoxin interaction trap systems according to the invention. These examples are designed to illustrate, not limit, the invention.

Thioredoxin Interaction Trap System

Interaction trap systems utilizing conformationally-constrained proteins have been developed for the detection of protein interactions, the identification and isolation of proteins participating in such interactions, the identification and isolation of agonists and antagonists of such interactions, and the identification and isolation of interacting peptide aptamers that may be used in protein detection assays in a manner analogous to antibody-type reagents. Exemplary systems are now described.

1. Thioredoxin Interaction Trap with Cdk2 Bait

Progression of eukaryotic cells through the cell cycle requires the coordinated action of a number of regulatory proteins that interact with and regulate the activity of Cdks (Sherr, Cell 79:551-555 (1994)). These modulatory proteins include cyclins, which positively regulate Cdk activity, cyclin dependent kinase inhibitors (Ckis), and a number of protein kinases and phosphatases, some of which, such as CAK and Cdc25, positively regulate kinase activity, some of which, such as Wee1, inhibit kinase activity, and some of which, such as Cdi1 (Gyuris et al., Cell 75:791-803 (1993)), have effects that are so far unknown (reviewed in Morgan, Nature 374:131-134 (1995)). Cdk2 is thought to be required for higher eukaryotic cells to progress from G1 into S-phase (Fang & Newport, J. Cell Biol. 66:731-742 (1991); Pagano et al., J. Cell Biol. 121:101-111 (1993); van den Heuvel & Harlow, Science 262:2050-2054 (1993)). Cdk2 kinase activity is positively regulated by Cyclin E and Cyclin A (Koff et al., Science 257:1689-1694 (1992); Dulic et al., Science 257:1958-1961 (1992); Tsai et al., Nature 353:174-7 (1991)) and negatively regulated by p21, p27 and p57 (Harper et al., Cell 75:805-816 (1993); Polyak et al., Genes Dev. 8:9-22 (1994); Toyoshima & Hunter, Cell 78:67-74 (1994); Matsuoka et al., Genes Dev. 9:660-662 (1995); Lee et al., Genes Dev. 9:639-649 (1995)); in addition, Cdk2 complexes with Cdi1 at the G1 to S transition (Gyuris et al., supra). Here we describe the use of a yeast two-hybrid system to select molecules which recognize Cdk2 from combinatorial libraries.

A prey vector is constructed containing the E. coli thioredoxin gene (trxA). pJG 4-4 (Gyuris et al., supra) is used as the vector backbone and cut with EcoRI and XhoI. A DNA fragment encoding the B112 transcription activation domain is obtained by PCR amplification of plasmid LexA-B112 (Doug Ruden, Ph.D. thesis, Harvard University, 1992) and cut with MunI and NdeI. The E. coli trxA gene is excised from the vector pALTRXA-781 (U.S. Pat. No. 5,292,646; InVitrogen Corp., San Diego, Calif.) by digestion with NdeI and SalI. The trxA and B112 fragments are then ligated by standard techniques into the EcoRI/XhoI-cut pJG 4-4 backbone, forming pYENAeTRX. This vector encodes a fusion protein comprising the SV40 nuclear localization domain, the B112 transcription activation domain, a hemagglutinin epitope tag, and E. coli thioredoxin (FIG. 2).

Peptide libraries are constructed as follows. The DNA oligomer 5′ GACTGACTGGTCCG(NNK)₂₀GGTCCTCAGTCAGTCAG 3′ (with N=A, C, G, T and K=G, T) (SEQ ID NO: 4) is synthesized and annealed to the second oligomer (5′ CTGACTGACTGAGGACC 3′) (SEQ ID NO: 5) in order to form double stranded DNA at the 3′ end of the first oligomer. The second strand is enzymatically completed using Klenow enzyme, priming synthesis with the second oligomer. The product is cleaved with AvaII, and inserted into RsrII cut pYENAeTRX. After ligation, the construct is used to transform E. coli by standard methods (Ausubel et al., Current Protocols in Molecular Biology, (Greene and Wiley-Interscience, New York, 1987-1994)). The library contained 2.9×10⁹ members, of which more than 10⁹ directed the synthesis of peptides. Twenty-mers were chosen as preferred peptides because they were long enough to fold into many different patterns of shape and charge and short enough that many of the encoding oligonucleotides lacked stop codons. Because of the presence of fortuitous restriction sites in some coding oligonucleotides and because some library members contained double inserts, approximately one fifth of the constrained peptides were longer or shorter than unit length.
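The statement that many 20-mer encoding oligonucleotides lack stop codons can be checked with simple arithmetic on the NNK scheme. The following Python sketch is offered only as an illustration; the function names are hypothetical, and the calculation assumes that each NNK codon is drawn uniformly and independently and ignores frameshifts, synthesis errors, and multiple inserts.

```python
from itertools import product

# Standard genetic code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nnk_codons():
    """Enumerate every codon allowed by NNK (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def stop_free_fraction(n_codons: int) -> float:
    """Fraction of random (NNK)n inserts with no in-frame stop codon.

    Assumes each codon is sampled uniformly and independently.
    """
    codons = nnk_codons()
    sense = sum(1 for c in codons if c not in STOP_CODONS)
    return (sense / len(codons)) ** n_codons


if __name__ == "__main__":
    frac = stop_free_fraction(20)
    print(f"NNK codons: {len(nnk_codons())}")
    print(f"Stop-free fraction for (NNK)20: {frac:.3f}")
    print(f"Stop-free members expected among 2.9e9: {frac * 2.9e9:.2e}")
```

Under these assumptions NNK allows 32 codons, of which only TAG is a stop, so roughly (31/32)²⁰, or about 53%, of inserts are expected to be stop-free. Applied to 2.9×10⁹ members this is of the same order as the figure of more than 10⁹ peptide-encoding members reported above.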

To screen for interacting peptides or “aptamers,” 100 μg of the library was used to transform the yeast strain EGY48 (Matα his3 leu2::2Lexop-LEU2 ura3 trp1 LYS2; Gyuris et al., supra). This strain also contained the reporter plasmid pSH 18-34, a pLR1Δ1 derivative, containing the yeast 2μ replication origin, the URA3 gene, and a GAL1-lacZ reporter gene with the GAL1 upstream regulatory elements replaced with 4 colE1 LexA operators (West et al., Mol. Cell Biol. 4:2467, 1984; Ebina et al., J. Biol. Chem. 258:13258, 1983; Hanes and Brent, Cell 57:1275, 1989), as well as the bait vector pLexA202-Cdk2 (Cdk2 encodes the human cyclin dependent kinase 2, an essential cell cycle enzyme) (Gyuris et al., supra; Tsai et al., Oncogene 8:1593, 1993). About 2.5×10⁶ transformants were obtained and pooled. The first selection step, growth on leucine-deficient medium after induction with 2% galactose/1% raffinose (Gyuris et al., supra; Guthrie and Fink, Guide to Yeast Genetics and Molecular Biology, Vol. 194, 1991), was performed with an 8-fold redundancy (20×10⁶ cfu) of the library in yeast, and about 900 colonies were obtained after growth at 30° C. for 5 days. The 300 largest colonies were streak purified and tested for the galactose-dependent expression of the LEU2 gene product and of β-galactosidase (encoded by pSH 18-34), the latter giving rise to blue yeast colonies in the presence of Xgal in the medium (Ausubel et al., supra). Thirty-three colonies fulfilled these requirements; sequencing showed that they comprised 14 different clones, all of which bound specifically to a LexA-Cdk2 bait but not to LexA or to a LexA-Cdk3 bait (Finley et al., Proc. Natl. Acad. Sci. USA 91:12980-12984 (1994)). The strength of binding was judged according to the intensity of the blue color formed by a colony of the yeast that contained each different interactor. By this means, each interactor was classified as a strong, medium, or weak binder, with the classification normalized to the amount of blue color caused by the various naturally-occurring partner proteins of Cdk2 in side by side mating interaction assays. An example of the peptide sequence of one representative of each class is given here:

Strong binder: peptide 3 (SEQ ID NO: 6)
-Gly₃₄-Pro₃₅-Leu-Val-Cys-Lys-Ser-Tyr-Arg-Leu-Asp-Trp-Glu-Ala-Gly-Ala-Leu-Phe-Arg-Ser-Leu-Phe-Gly₃₄-Pro₃₅-

Medium binder: peptide 2 (SEQ ID NO: 7)
-Gly₃₄-Pro₃₅-Met-Val-Val-Ala-Ala-Glu-Ala-Val-Arg-Thr-Val-Leu-Leu-Ala-Asp-Gly-Gly-Asp-Val-Thr-Gly₃₄-Pro₃₅-

Weak binder: peptide 6 (SEQ ID NO: 8)
-Gly₃₄-Pro₃₅-Pro-Asn-Trp-Pro-His-Gln-Leu-Arg-Val-Gly-Arg-Val-Leu-Trp-Glu-Arg-Leu-Ser-Phe-Glu-Gly₃₄-Pro₃₅-

Control peptides which do not bind detectably are:

c4 (SEQ ID NO: 14):
Arg-Arg-Ala-Ser-Val-Cys-Gly-Pro-Leu-Leu-Ser-Lys-Arg-Gly-Tyr-Gly-Pro-Pro-Phe-Tyr-Leu-Ala-Gly-Met-Thr-Ala-Pro-Glu-Gly-Pro-Cys

and c (SEQ ID NO: 15):
Arg-Arg-Ala-Ser-Val-Cys-Gly-Pro-Leu-His-Tyr-Trp-Gly-Leu-Gly-Gly-Phe-Val-Asp-Leu-Trp-Gln-Glu-Thr-Thr-Gly-Val-Gly-Pro-Cys.

FIG. 3A shows that 5 of the peptide aptamers reacted strongly with the LexA-Cdk2 bait but not with a large number of unrelated proteins. None of the Cdk2 aptamers interacted with CDC28 or Cdc2, which are both 65% identical to Cdk2. However, 2 of the 5 Cdk2 interactors also interacted with human Cdk3, and 1 of the 5 also interacted with Drosophila Cdc2c, suggesting that these peptides recognize determinants common to these proteins. Both theoretical considerations and calibration experiments with lambda repressor's C terminus suggested that transcription of the pSH18-34 reporter in EGY48 can be activated by protein interactions with Kds as weak as 10⁻⁶ M. The fact that peptides 3 and 13 directed robust transcription of this LexAop-lacZ reporter was consistent with the idea that they may interact significantly more tightly. The sequence of these peptides is shown in FIG. 3B.

In related experiments, 6 additional aptamers (i.e., pep6 (SEQ ID NO: 21), pep7 (SEQ ID NO: 22), pep9 (SEQ ID NO: 23), pep12 (SEQ ID NO: 24), pep13 (SEQ ID NO: 25), and pep14 (SEQ ID NO: 26)) were shown to interact with the LexA-Cdk2 bait but not with unrelated proteins such as Max or Rb, or with certain Cdk family members such as Cdk4, which shares 47% sequence identity with Cdk2 (FIG. 4A). However, some aptamers interacted with other Cdk family members. The fact that different peptide aptamers showed distinct patterns of cross-reactivity with different Cdks indicated that these aptamers recognized different epitopes conserved among various Cdks. The sequence of the peptide loops is shown in FIG. 4B. Non-unit-length peptides occurred at the same frequency among the Cdk2 interacting aptamers as in the library as a whole. No aptamer showed significant sequence similarity to known proteins, as expected if the 20-mer peptides indeed formed novel recognition structures. All of the peptides were charged, suggesting that some of their interactions with the Cdk2 target could be ionic.

To confirm the specificity of the Cdk2 interaction, a Gst-Cdk2 fusion protein was immobilized on glutathione sepharose beads, and these beads were used to specifically precipitate bacterially expressed peptide aptamers. One set of results is shown in FIG. 5, and another set in FIG. 6.

For the FIG. 5 results, Gst-Cdk2 was expressed in E. coli and purified on glutathione sepharose as previously described (Lee et al., Nature 374:91-94 (1995)). The peptides were generated as follows: fragments that directed the synthesis of peptides 3 and 13 were made by PCR amplification of the insert encoded by the corresponding library plasmid and introduced into pAL-TrxA (LaVallie et al., supra). Fusion proteins were expressed and lysed in a French pressure cell as previously described (LaVallie et al., supra). Coprecipitation was carried out using Gst-Sepharose beads as described in Lee et al. (supra), and samples were run on 15% SDS polyacrylamide gels and transferred to nylon membranes. TrxA-containing fusion proteins were visualized by probing the membranes with an anti-TrxA antibody, followed by treatment of the immobilized antibody with peroxidase-coupled anti-rabbit IgG antibody and ECL reagents according to the manufacturer's instructions (Amersham, Arlington Heights, Ill.).

For the FIG. 6 results, Gst and Gst-Cdk2 were purified as described (Lee et al., Nature 374:91-94 (1995)). pALHISTRX was constructed by annealing the oligonucleotides 5′ TAATGAGCGATAAACACCACCACCACCACCACGACGACGACGACAAAGG 3′ (SEQ ID NO: 27) and 5′ TACCTTTGTCGCTGTCGTCGTGGTGGTGGTGGTGGTGTTTATCGCTCATTA 3′ (SEQ ID NO: 28), and ligating into NdeI-cut pALTRX-781 (LaVallie et al., supra). AvaII fragments encoding peptide loops were then cloned from the library plasmids into RsrII-cut pALHISTRX. His6-TrxA and His6-aptamers were expressed in G1724 as previously described (Ausubel et al., supra), the proteins were purified on Ni²⁺-NTA-Agarose according to manufacturer's directions (Qiagen, Chatsworth, Calif.), and then dialyzed against 10 mM Hepes pH 7.4/50 mM NaCl. 1 μg of His6-TrxA or His6-aptamers was precipitated with Gst or Gst-Cdk2 sepharose beads as described (Lee et al., supra), and the products detected by Western blot analysis with an anti-TrxA rabbit antiserum and ECL reagents (Amersham, Arlington Heights, Ill.).

The results shown in FIGS. 5 and 6 demonstrated that the interactions between Cdk2 and the peptide aptamers could be observed in vitro, and were thus independent of any bridge proteins native to yeast.

To determine the binding affinities of these aptamers for Cdk2, the following experiments were carried out. Based on interpolation from interaction trap calibration experiments (Estojak et al., Mol. Cell. Biol. 15:5820-5829 (1995)), the robust transcription that some of the aptamers of FIGS. 4A and 4B directed from the pSH18-34 reporter suggested that the equilibrium dissociation constants (Kds) of the interactions were <10⁻⁶ M. In order to precisely measure the binding affinity of the aptamers to Cdk2, we used an evanescent wave instrument (BIAcore, Pharmacia, Piscataway, N.J.). Purified His6-Cdk2 was coupled to CM-dextran chips, and peptide aptamers flowed in running buffer over the chips. Following binding, the chips were rinsed with running buffer lacking aptamer.

In particular, in these experiments, His6-Cdk2 was cross-linked in 10 mM MES pH 6.1/50 mM NaCl to CM5 chips with an amine-coupling kit (Pharmacia, Piscataway, N.J.). Purified aptamers were then flowed in running buffer (Hepes 10 mM pH 7.4/50 mM NaCl) onto the chips at 5 μl/minute, and association and dissociation of the His6-Cdk2-aptamer complexes recorded as variations in resonance angle with time. Association phase started upon aptamer injection, and dissociation phase upon running buffer injection. Portions of association and dissociation curves were then fitted that excluded the sudden variations in resonance angle caused by transitions between running buffer and aptamer-containing running buffer, which differed slightly in refractive index (“buffer fluxes”).

Association and dissociation rate constants were determined by fitting the association and dissociation phases of at least two runs (and typically four runs) for each aptamer to exponential functions using the data analysis program IGOR (Wavemetrics, Inc., Lake Oswego, Oreg.) and a non-linear least squares algorithm as described in O'Shannessy et al. (Anal. Biochem. 212:457-468 (1993)). Kds were calculated by dividing dissociation rate constants by association rate constants. A representative wave instrument run is shown in FIG. 7, and Table 1 indicates that, under the conditions described above, all aptamers exhibited Kds between 30 and 120 nM.
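To illustrate how a dissociation rate constant can be extracted from an exponential decay of the signal, the following minimal sketch fits synthetic, noise-free data. It is not the IGOR non-linear least squares procedure referenced above; it swaps in a simpler log-linear regression, and it assumes a single-exponential decay to a zero baseline, R(t) = R0·exp(−k_off·t). The function name, the time grid, and the starting signal of 100 units are hypothetical.

```python
import math


def fit_dissociation_rate(times_s, responses):
    """Estimate k_off (s^-1) from a dissociation phase.

    Performs an ordinary least squares fit of ln(response) against time;
    the negated slope is k_off. Assumes all responses are positive and
    decay toward zero (no baseline offset).
    """
    logs = [math.log(r) for r in responses]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return -sxy / sxx


# Synthetic example only: a decay generated with a known rate constant.
TRUE_K_OFF = 480e-6  # s^-1, chosen to be of the order reported in Table 1
times = [10.0 * i for i in range(60)]
signal = [100.0 * math.exp(-TRUE_K_OFF * t) for t in times]
k_off_fit = fit_dissociation_rate(times, signal)

if __name__ == "__main__":
    print("fitted k_off = %.3e s^-1" % k_off_fit)
```

With real sensorgrams, noise and baseline drift make the log transform unreliable near zero signal, which is one reason a non-linear least squares fit of the untransformed data is generally preferred.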

TABLE 1

Aptamer   Dissociation rate constant × 10⁻⁶ (s⁻¹)   Association rate constant (M⁻¹s⁻¹)   Kd (nM)
Pep 2     480 +/− 109                                7474 +/− 270                          64 +/− 12
Pep 3     246 +/− 20                                 2201 +/− 160                          112 +/− 1
Pep 5     428 +/− 16                                 8263 +/− 215                          52 +/− 1
Pep 8     120 +/− 15                                 3122 +/− 23                           38 +/− 5
Pep 10    693 +/− 64                                 6555 +/− 28                           105 +/− 10
Pep 11    484 +/− 25                                 5590 +/− 168                          87 +/− 7
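The relationship between the rate constants and the Kds in Table 1 (Kd = dissociation rate constant divided by association rate constant) can be checked with a few lines of arithmetic. The sketch below uses only the mean values from Table 1; the dictionary and function names are hypothetical. The computed values agree with the tabulated Kds to within about 1 nM, the small differences presumably reflecting rounding or averaging over individual runs.

```python
# Mean rate constants from Table 1: (k_off in s^-1, k_on in M^-1 s^-1).
RATE_CONSTANTS = {
    "Pep 2": (480e-6, 7474.0),
    "Pep 3": (246e-6, 2201.0),
    "Pep 5": (428e-6, 8263.0),
    "Pep 8": (120e-6, 3122.0),
    "Pep 10": (693e-6, 6555.0),
    "Pep 11": (484e-6, 5590.0),
}

# Kd values as reported in Table 1 (nM), used here only for comparison.
REPORTED_KD_NM = {
    "Pep 2": 64, "Pep 3": 112, "Pep 5": 52,
    "Pep 8": 38, "Pep 10": 105, "Pep 11": 87,
}


def kd_nanomolar(k_off, k_on):
    """Equilibrium dissociation constant Kd = k_off / k_on, converted to nM."""
    return k_off / k_on * 1e9


if __name__ == "__main__":
    for name, (k_off, k_on) in RATE_CONSTANTS.items():
        print("%-6s computed Kd = %6.1f nM, reported Kd = %d nM"
              % (name, kd_nanomolar(k_off, k_on), REPORTED_KD_NM[name]))
```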

The ability to select TrxA-peptides that interact specifically with designated intracellular baits allows for the creation of other classes of intracellular reagents. For example, appropriately derivatized TrxA-peptide fusions may allow the creation of antagonists or agonists (as described above). Alternatively, peptide fusions allow for the creation of homodimeric or heterodimeric “matchmakers,” which force the interaction of particular protein pairs. In one particular example, two proteins are forced together by utilizing a leucine zipper sequence attached to a conformation-constraining protein containing a candidate interaction peptide. This protein can bind to both members of a protein pair of interest and direct their interaction. Alternatively, the “matchmaker” may include two different sequences, one having affinity for a first polypeptide and the second having affinity for the second polypeptide; again, the result is directed interaction between the first and second polypeptides. Another practical application for the peptide fusions described herein is the creation of “destroyers,” which target a bound protein for destruction by host proteases. In an example of the destroyer application, a protease is fused to one component of an interacting pair and that component is allowed to interact with the target to be destroyed (e.g., a protease substrate). By this method, the protease is delivered to its desired site of action and its proteolytic potential effectively enhanced. Yet another application of the fusion proteins described herein is as “conformational stabilizers,” which induce target proteins to favor a particular conformation or stabilize that conformation. In one particular example, the ras protein has one conformation that signals a cell to divide and another conformation that signals a cell not to divide. By selecting a peptide or protein that stabilizes the desired conformation, one can influence whether a cell will divide. Other proteins that undergo conformational changes which increase or decrease activity can also be bound to an appropriate “conformational stabilizer” to influence the property of the desired protein.

2. Functional Inhibition of Cdk2

To determine whether Cdk2 interacting peptides might inhibit Cdk2 function in vivo, we took advantage of the fact that human Cdk2 can complement temperature sensitive alleles of Cdc28 (Elledge and Spottswood, EMBO 10:2653-2659, 1991; Ninomiya et al., PNAS 88:9006-9010, 1991; Meyerson et al., EMBO 11:2909-2917, 1992). Peptide 13 inhibits the plating efficiency of a Cdk2-dependent yeast. A strain carrying the temperature sensitive cdc28-1N mutation can form colonies at high temperature if it carries a plasmid that expresses Cdk2. At the restrictive temperature, compared to the plating efficiency of yeast expressing control peptides, expression of peptide 13 diminishes the plating efficiency of this strain by 10-fold. Both peptide 3 and 13 have similar effects on the plating efficiency at 37° C. of a Cdk2(+) strain that carries the cdc28-13ts allele.

Expression of peptide 13 slows the doubling time of a Cdk2(+), cdc28ts-1N strain by 50%. Microscopic examination of strains expressing the peptide revealed that a high proportion of these cells had an elongated morphology characteristic of cdc28-1N cells at the restrictive temperature, whereas cells expressing a control peptide had a more normal morphology.

Peptide 13 does not affect the growth of a cdc28-1Nts strain at high temperature when the defect is complemented by a plasmid expressing wild-type Cdc28 product, and has no effect on yeast at the permissive temperature. While we do not intend to be bound by any particular theory, it appears that this peptide blocks yeast cell cycle progression by binding to some face of the Cdk2 molecule and inhibiting its function and thereby interfering with its ability to interact with cyclins, other partners, or with substrates.

In later experiments with the aptamers of FIG. 4B, inhibition of Cdk2 activity by these peptides (for example, by binding to a face of the molecule and by blocking its interaction with one of its partner proteins or substrates) was examined. In particular, the ability of the aptamers to inhibit phosphorylation of Histone H1 by Cdk2/Cyclin E kinase was tested. To carry out these experiments, 2×10⁷ Sf9 cells were co-infected with recombinant baculoviruses expressing hemagglutinin-tagged Cdk2 and His6-Cyclin E as described (Kato et al., Genes & Dev. 7:331-342 (1993); Desai et al., Mol. Biol. Cell 3:571-582 (1992)). Cells were lysed 40 hours after infection in 500 of 1× Kinase Buffer (Kato et al., supra), and 5 μl of a 100-fold diluted extract was used in 30 μl reactions. Reactions were carried out for 20 minutes at 25° C. by adding 2.5 μCi of [γ-³²P]ATP (3000 Ci/mmol), 25 μM ATP, 100 ng of Histone H1 (Sigma, St. Louis, Mo.), and varying amounts of His6-TrxA or His6-aptamers. Samples were run on 15% SDS-PAGE gels and exposed by autoradiography.

The results of these experiments are shown in FIG. 8. All tested aptamers were able to inhibit phosphorylation of Histone H1 by Cdk2/Cyclin E kinase. Under standard conditions (pH 7.5, 0 mM NaCl) (Kato et al., supra), apparent half-inhibitory concentrations ranged from 1.5 to 100 nM. To rule out the possibility that a trace bacterial contaminant was responsible for the inhibition, we removed the His6-peptide aptamer from the Pep2 preparation with a rabbit polyclonal anti-thioredoxin antiserum; this immunodepleted preparation no longer inhibited Cdk2 kinase activity. Half-inhibitory concentrations of aptamers were lower than the Kds measured from evanescent wave experiments, consistent with the idea that some of the energy of each interaction is ionic and is reduced by the salt in the evanescent wave instrument running buffer.

In co-precipitation experiments (Reymond et al., Oncogene 11:1173-1178 (1995)), purified Pep2 did not compete with in vitro-translated Cyclin E for binding to in vitro-translated Cdk2. However, inhibition by Pep2 was reversed by addition of a 10-fold excess of Histone H1, suggesting that at least Pep2 inhibits kinase activity by competing with its H1 substrate.

Previous studies have established that libraries of unconstrained peptides contain sequences capable of recognizing targets in vitro (Devlin et al., Science 249:404-406 (1990); Cwirla et al., Proc. Natl. Acad. Sci. USA 87:6378-6382 (1990); Lam et al., Nature 354:82-84 (1991); Songyang et al., Current Biology 4:973-982 (1994); Scott et al., Current Biology 5:40-48 (1994)) and in yeast (Yang et al., Nucl. Acids. Res. 23:1152-1156 (1995)); such isolated peptide sequences often bear similarity to natural interactors. By contrast, although constrained peptide libraries are less conformationally diverse (McConnell et al., Gene 151:115-118 (1994)), the lack of conformational diversity should lower the entropic cost if binding causes the loop to adopt a single conformation (Spolar et al., Science 263:777-784 (1994)); this reduction in entropic cost may account for the fact that our Cdk2 peptide aptamers recognize their targets with higher affinity than is typically observed for unconstrained peptides (Yang et al., supra; Oldenburg et al., Proc. Natl. Acad. Sci. USA 89:5393-5397 (1992); McLafferty et al., Gene 128:29-36 (1993)). This high affinity suggests that peptide aptamers may inhibit protein function in vivo, in the simplest case by binding to specific faces of the target molecule and disrupting its interaction with specific partners or effectors.

The ability to generate large numbers of aptamers from combinatorial libraries, taken together with the interaction trap, which offers a powerful selection for those that bind specific proteins, facilitates the selection of peptide aptamers against a variety of intracellular targets. Aptamers which inhibit protein contacts can be used to aid the dissection of the networks of protein interactions that govern division of higher eukaryotic cells and can also be used for the genetic analysis of those metazoan organisms for which isolation of specific missense alleles may be impractical. The analogy of the aptamers of the invention with antibodies indicates that peptide aptamers can also be used in other applications in which immunological reagents are now employed, such as ELISAs, immunofluorescence experiments, and sensors. If desired, the affinity of these aptamers may be increased, for example, by increasing their valency and using existing interaction technology to select mutants that bind more tightly. This first generation of peptide aptamers facilitates the production of recognition modules for intracellular nanotechnologies aimed at destroying, modifying, and assembling macromolecules inside cells.

3. Thioredoxin Interaction Trap with OncoRas Bait

The ras proteins are essential for many signal transduction pathways and regulate numerous physiological functions including cell proliferation. The ras genes were first identified from the genomes of Harvey and Kirsten sarcoma viruses. The three types of mammalian ras genes (N-ras, K-ras, and H-ras) encode highly conserved membrane-bound guanine nucleotide binding proteins with a molecular mass of 21 kDa, which cycle between the active (GTP-bound) form and the inactive (GDP-bound) form.

In normal cells, the active form of Ras is short-lived, as its intrinsic GTPase activity rapidly converts the bound GTP to GDP. The GTPase activity is stimulated 10⁵-fold by GTPase-activating proteins (GAPs). GTP-bound Ras interacts with GAP, c-Raf, neurofibromatosis type 1 (NF-1) and Ral guanine nucleotide dissociation stimulator (RalGDS).

Mutationally-activated RAS proteins are found in about 30% of human tumor cells and have greatly decreased GTPase activity which cannot be stimulated by GAPs. The majority of mutations studied thus far are due to a point mutation at either residue Gly-12 or residue Gln-61 of Ras. These Ras mutants remain in the active form and interact with the downstream effectors to result in tumorigenesis. It has been shown that there are significant conformational differences between GTP-bound forms of wild-type and oncogenic RAS proteins. Such conformational differences are likely causes for malignant transformation induced by oncogenic ras proteins.

Such mutationally-activated conformational changes in GTP-bound H-ras mutants provide targets for members of a conformationally constrained random peptide library. In the present example, the library is a conformationally constrained thioredoxin peptide library, as described above. Library members that interact with oncogenic Ras have been identified using a variation of the interaction trap technology provided above. The oncogenic Ras peptide aptamers isolated may be assayed for their ability to disrupt the interaction of oncogenic Ras with known effectors and to inhibit cellular transformation.

We have used well-characterized oncogenic H-ras(G12V) for isolation and characterization of its peptide aptamers. Peptide aptamers for other oncogenes can be isolated using adaptations of this protocol as provided herein.

Bait Construction

Construction of LexA-Ras(G12V)/pEG202: H-Ras(G12V) DNA was excised by digesting BTM116-H-Ras(G12V) (FIG. 9) with BamHI and SalI. H-Ras(G12V) DNA was ligated with pEG202 backbone digested with BamHI and SalI. The resulting plasmid was called pEG202-H-Ras(G12V) (or V6) (FIG. 10).

Screening for H-Ras(G12V) Peptide Aptamers

pEG202-H-Ras(G12V) (V6) was transformed into the EGY48 strain according to a standard yeast transformation protocol; in particular, the protocol provided by Zymo Research (Orange County, CA) was used here. EGY48 was grown in YPD medium to OD₆₀₀=0.2-0.7. Cells were pelleted at 500×g for 4 min. and resuspended in 10 ml of EZ1 solution (Zymo Research). The cells were then pelleted by centrifugation and resuspended in 1 ml of EZ2 (Zymo Research). Aliquots of competent cells (50 μl) were stored in a −70° C. freezer.

An aliquot of competent cells was mixed with 0.1 μg of LexA-H-Ras(G12V)/pEG202 and 500 μl of EZ3 solution (Zymo Research). The mixture was incubated at 30° C. for 30 min. and plated onto a yeast medium lacking histidine and uracil. One colony was picked and inoculated into 100 ml of glucose Ura⁻His⁻ medium at 30° C. with shaking (150 rpm) until the OD₆₀₀ measurement was 0.96. The culture was centrifuged at 2000 g for 5 min and cell pellets were resuspended in 5 ml of sterile LiOAc/TE. The cells were again centrifuged as above and resuspended in 0.5 ml of sterile LiOAc/TE.

Aliquots (50 μl) of the cells were then incubated at 30° C. for 30 min. with 1 μg of thioredoxin peptide library DNA, 70 μg of salmon sperm DNA, and 300 μl of sterile 40% PEG 4000 in LiOAc/TE. The mixtures were heat-shocked at 42° C. for 15 min. Each aliquot was plated onto a 24 cm×24 cm plate containing glucose Ura⁻His⁻Trp⁻ medium and was incubated at 30° C. for two days. The transforming efficiency typically ranged from 50,000 to 100,000 colony forming units per μg of library DNA.

A total of 1.5 million transformants were obtained and were plated onto the selection medium of galactose/raffinose Leu⁻Ura⁻His⁻Trp⁻. Of the 338 colonies formed, 50 were randomly picked and inoculated into 5 ml of glucose Leu⁻Ura⁻His⁻Trp⁻ medium for preparation of yeast plasmid DNA. A half ml of each yeast culture was mixed with an equal volume of acid-washed sand and phenol/chloroform/isoamyl alcohol (24:24:1), and vortexed for 2 min. The mixture was then centrifuged for 15 min., and the supernatant was precipitated with ethanol. DNA pellets were resuspended in 50 μl of TE.

One μl of each sample was used to transform E. coli KC8 cells by electroporation. Bacterial transformants were selected on minimal agar supplemented with uracil, leucine, histidine, and ampicillin. Each type of transformant resulted in the final isolation of a plasmid with a leucine marker, which carries a DNA fragment encoding a thioredoxin-peptide fusion protein.

Sequence determination of the 50 isolates was carried out according to the directions of the fmol DNA™ sequencing systems (Promega, Madison, Wis.) using primer 5′-GACGGGGCGATCCTCGTCG-3′ (SEQ ID NO:16). Nine out of 50 isolates (referred to as #4, #18, #39, #41, #22, #24, #30, #31, #46) contained unique peptide encoding sequences, as determined by electrophoresis of the dT/ddT termination reaction. Among them, the predicted peptide aptamer sequence of #39 is as follows: Trp-Ala-Glu-Trp-Cys-Gly-Pro-Val-Cys-Ala-His-Gly-Ser-Arg-Ser-Leu-Thr-Leu-Leu-Thr-Lys-Tyr-His-Val-Ser-Phe-Leu-Gly-Pro-Cys-Lys-Met-Ile-Ala-Pro-Ile-Leu-Asp (SEQ ID NO:17). From our results, it appears that approximately 60 unique H-Ras(G12V) peptide aptamers (338×9/50) were isolated in the first round of screening.
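The extrapolation behind the figure of approximately 60 unique aptamers can be written out explicitly. The sketch below simply scales the fraction of unique sequences found in the sequenced sample (9 of 50) to all 338 colonies; the function name is hypothetical, and the linear scaling is an assumption, since in a larger sample a growing share of isolates would be expected to repeat sequences already seen.

```python
def estimate_unique_aptamers(total_colonies, sampled, unique_in_sample):
    """Scale the unique fraction observed in a sample to the whole colony set.

    Assumes the sampled colonies are representative and that uniqueness
    scales linearly with colony number (an optimistic simplification).
    """
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return total_colonies * unique_in_sample / sampled


# Numbers taken from the text: 338 colonies, 50 sequenced, 9 unique.
estimate = estimate_unique_aptamers(338, 50, 9)

if __name__ == "__main__":
    print("estimated unique aptamers: %.2f" % estimate)
```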

Other Embodiments

As described above, the invention features a method for detecting and analyzing protein-protein interactions. Typically, in the above experiments, the bait protein is fused to the DNA binding domain, and the prey protein (in association with the conformation-constraining protein) is fused to the gene activation domain. The invention, however, is readily adapted to other formats. For example, the invention also includes a “reverse” interaction trap in which the bait protein is fused to a gene activation domain, and the prey protein (in association with a conformation-constraining protein) is fused to the DNA binding domain. Again, an interaction between the bait and prey proteins results in activation of reporter gene expression. Such a “reverse” interaction trap system, however, depends upon the use of prey proteins which do not themselves activate downstream gene expression.

The protein interaction assays described herein can also be accomplished in a cell-free, in vitro system. Such a system may begin with a DNA construct including a reporter gene operably linked to a DNA-binding-protein recognition site (e.g., a LexA binding site). To this DNA is added a bait protein (e.g., any of the bait proteins described herein bound to a LexA DNA binding domain) and a prey protein (e.g., one of a library of conformationally-constrained candidate interactor prey proteins bound to a gene activation domain). Interaction between the bait and prey protein is assayed by measuring the reporter gene product, either as an RNA product, as an in vitro translated protein product, or by some enzymatic activity of the translated reporter gene product. Alternatively, interactions involving conformationally constrained proteins may be carried out by direct in vitro techniques, for example, by any standard physical or biochemical technique for identifying protein interactions (such as immobilization of a first protein on a column or other solid support and contact with a conformationally-constrained protein). These direct in vitro approaches are preferably carried out in such a way that the DNA encoding the conformationally-constrained protein may be readily isolated, for example, by using techniques involving phage display or display of the protein on the E. coli flagella.

These in vitro systems may also be used to identify agonists or antagonists, simply by adding to a known pair of interacting proteins (in the above described system) a candidate agonist or antagonist interactor and assaying for an increase or decrease (respectively) in reporter gene expression, as compared to a control reaction lacking the candidate compound or protein. To facilitate large scale screening, candidate prey proteins or candidate agonists or antagonists may be initially tested in pools, for example, of ten or twenty candidate compounds or proteins. From pools demonstrating a positive result, the particular interacting protein or agonist or antagonist is then identified by individually assaying the components of the pool. Such in vitro systems are amenable to robotic automation or to the production of kits. Kits including the components of any of the interaction trap systems described herein are also included in the invention.
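The pooled screening strategy described above (test pools first, then assay the members of positive pools individually) can be sketched as follows. This is an illustrative outline only: the `assay` callable is a hypothetical stand-in for a reporter gene measurement compared against a control, the candidate names and the set of active candidates are invented for the example, and the sketch assumes that a pool scores positive whenever it contains at least one active member.

```python
def screen_in_pools(candidates, assay, pool_size=10):
    """Two-stage screen: assay pools, then deconvolute positive pools.

    `assay` takes a list of candidates and returns True if reporter gene
    expression differs from the control. Returns the individual hits and
    the total number of assays performed.
    """
    hits = []
    n_assays = 0
    for start in range(0, len(candidates), pool_size):
        pool = candidates[start:start + pool_size]
        n_assays += 1
        if assay(pool):
            # Positive pool: test each member on its own.
            for candidate in pool:
                n_assays += 1
                if assay([candidate]):
                    hits.append(candidate)
    return hits, n_assays


# Toy data (hypothetical): 100 candidates, two of which are active.
candidates = ["cand%d" % i for i in range(100)]
active = {"cand7", "cand42"}


def toy_assay(pool):
    """Stand-in for a reporter readout: positive if any member is active."""
    return any(c in active for c in pool)


hits, n_assays = screen_in_pools(candidates, toy_assay, pool_size=10)

if __name__ == "__main__":
    print("hits:", hits)
    print("assays performed:", n_assays, "of", len(candidates), "if tested singly")
```

In this toy case, pools of ten reduce 100 individual assays to 30. The saving shrinks as the proportion of active candidates rises, and real pools may mask weak or opposing effects, which the sketch does not model.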

In one particular embodiment, interacting proteins identified in vitro are tested for their ability to interact in vivo. Such in vivo interacting proteins may be used for any diagnostic or therapeutic purpose. For example, proteins shown to interact in vivo may be used to disrupt, encourage, or stabilize intracellular interactions or may be used as an intracellular antibody-type reagent.

The components (e.g., the various fusion proteins or DNA therefor) of any of the in vivo or in vitro systems of the invention may be provided sequentially or simultaneously depending on the desired experimental design.

Other embodiments are within the following claims.

1. A method for identifying an antagonist or agonist protein, said method comprising: (a) providing a host cell which contains a pair of interacting proteins, and a reporter gene whose expression is mediated by the pair of interacting proteins; (b) introducing into said host cell a DNA coding for a candidate agonist or antagonist protein, wherein said candidate agonist or antagonist protein has reduced structural flexibility due to covalent bonding of the amino and carboxy termini of said agonist or antagonist protein to a conformation-constraining protein or disulfide bonding between cysteine residues at the amino and carboxy termini of said agonist or antagonist protein; and (c) measuring expression of said reporter gene, wherein an increase in expression of said reporter gene identifies said candidate protein as an agonist protein and a decrease in expression of said reporter gene identifies said candidate protein as an antagonist protein.
2. The method according to claim 1, wherein a change in expression of the reporter gene affects cell viability or is detectable in a color assay leading to the presence or absence of color.
3. The method of claim 1, wherein said method further comprises the step of isolating said agonist or antagonist protein.
4. A method for identifying an antagonist or agonist protein, said method comprising: (a) providing a pair of interacting proteins, and a reporter gene whose expression is mediated by the pair of interacting proteins; (b) adding to said pair of interacting proteins a candidate agonist or antagonist protein, wherein said candidate agonist or antagonist protein has reduced structural flexibility due to covalent bonding of the amino and carboxy termini of said agonist or antagonist protein to a conformation-constraining protein or disulfide bonding between cysteine residues at the amino and carboxy termini of said agonist or antagonist protein; and (c) assaying for an increase or decrease in reporter gene expression, wherein an increase in reporter gene expression identifies said candidate protein as an agonist protein and a decrease in reporter gene expression identifies said candidate protein as an antagonist protein.
5. The method of claim 4, wherein said method further comprises the step of isolating said agonist or antagonist protein.
6. A method for the identification, in an intracellular screen, of peptides that affect an identifiable characteristic of a cell, said method comprising: (a) transforming, into a population of eukaryotic host cells, a DNA library encoding peptides having reduced structural flexibility due to covalent bonding of the amino and carboxy termini of said peptide to a conformation-constraining protein or disulfide bonding between cysteine residues at the amino and carboxy termini of said peptide, there being at least 100 different recombinant molecules encoding different peptides in said population, each molecule being in at least one cell of said population; and (b) detecting the identifiable characteristic, wherein the identifiable characteristic is cell viability or a perturbation of cell cycle progression.
7. The method according to claim 6, wherein the DNA encoding the peptides is random.